class: center, middle, inverse, title-slide .title[ # Introduction to Geospatial Techniques for Social Scientists in R ] .subtitle[ ## Applied Data Wrangling ] .author[ ### Stefan Jünger & Anne-Kathrin Stroppe ] .institute[ ###
GESIS Workshop
] .date[ ### June 07, 2022 ] --- layout: true --- ## Now <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Time </th> <th style="text-align:left;"> Title </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: gray !important;"> June 06 </td> <td style="text-align:left;color: gray !important;"> 10:00-11:30 </td> <td style="text-align:left;font-weight: bold;"> Introduction to GIS </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> June 06 </td> <td style="text-align:left;color: gray !important;"> 11:45-13:00 </td> <td style="text-align:left;font-weight: bold;"> Vector Data </td> </tr> <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> June 06 </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 13:00-14:00 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Fingerfood@GESIS </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> June 06 </td> <td style="text-align:left;color: gray !important;"> 14:00-15:30 </td> <td style="text-align:left;font-weight: bold;"> Mapping </td> </tr> <tr> <td style="text-align:left;color: gray !important;border-bottom: 1px solid"> June 06 </td> <td style="text-align:left;color: gray !important;border-bottom: 1px solid"> 15:45-17:00 </td> <td style="text-align:left;font-weight: bold;border-bottom: 1px solid"> Raster Data </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> June 07 </td> <td style="text-align:left;color: gray !important;"> 09:00-10:30 </td> <td style="text-align:left;font-weight: bold;"> Advanced Data Import & Processing </td> </tr> <tr> <td style="text-align:left;color: gray !important;background-color: yellow !important;"> June 07 </td> <td style="text-align:left;color: gray !important;background-color: yellow !important;"> 10:45-12:00 </td> <td 
style="text-align:left;font-weight: bold;background-color: yellow !important;"> Applied Data Wrangling & Linking </td> </tr> <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> June 07 </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 12:00-13:00 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> June 07 </td> <td style="text-align:left;color: gray !important;"> 13:00-14:30 </td> <td style="text-align:left;font-weight: bold;"> Investigating Spatial Autocorrelation </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> June 07 </td> <td style="text-align:left;color: gray !important;"> 14:45-16:00 </td> <td style="text-align:left;font-weight: bold;"> Spatial Econometrics & Outlook </td> </tr> </tbody> </table> --- ## What Are Georeferenced Data? .pull-left[ </br> Data with a direct spatial reference `\(\rightarrow\)` **geo-coordinates** - Information about geometries - Optional: Content in relation to the geometries ] .pull-right[ <img src="data:image/png;base64,#../img/fig_geometries.png" width="85%" style="display: block; margin: auto;" /> .tinyisher[Sources: OpenStreetMap / GEOFABRIK (2018), City of Cologne (2014), and the Statistical Offices of the Federation and the Länder (2016) / Jünger, 2019] ] --- ## Georeferenced Survey Data Survey data enriched with geo-coordinates (or other direct spatial references) </br> <img src="data:image/png;base64,#../img/geo_surveys.png" width="85%" style="display: block; margin: auto;" /> </br> .center[**With georeferenced survey data, we can analyze interactions between individual behaviors and attitudes and the environment.**] --- ## An Example Workflow .pull-left[ From the addresses to analyses with georeferenced survey data, several steps and challenges along the way. 
We will talk about:

- Data Protection & Data Access
- Geocoding
- Spatial Data Linking
- Applied Examples
]

.pull-right[
<img src="data:image/png;base64,#../img/varreport.png" width="75%" style="display: block; margin: auto;" />
]

---

## Data Protection

</br>
</br>

That’s one of the biggest issues

- Explicit spatial references increase the risk of re-identifying anonymized survey respondents
- Can occur during the processing of data but also during the analysis

</br>

.center[**Affects all phases of research and data management!**]

---

## Data Availability

.pull-left[
Geospatial Data

- Often distributed in a decentralized way
- Fragmented data landscape, at least in Germany

Georeferenced Survey Data

- Primarily, survey data
- Depends on documentation
- Access difficult due to data protection restrictions
]

.pull-right[
<img src="data:image/png;base64,#../img/data_availability.png" width="75%" style="display: block; margin: auto;" />

.right[.tinyisher[
https://www.eea.europa.eu/data-and-maps

https://datasearch.gesis.org/

https://datasetsearch.research.google.com/
]]
]

---

## Distribution & Re-Identification Risk

Even without (in)direct spatial references, data may still be sensitive

- Geospatial attributes add new information to existing data
- May be part of general data privacy checks, but we may not distribute these data as is

.pull-left[
Safe Rooms / Secure Data Centers

- Control access
- Check output
]

.pull-right[
<img src="data:image/png;base64,#../img/safe_room.png" width="825" style="display: block; margin: auto;" />

.right[.tinyisher[https://www.gesis.org/en/services/processing-and-analyzing-data/guest-research-stays/secure-data-center-sdc]]
]

---

## Legal Regulations in Data Processing

.pull-left[
Storing personal information such as addresses in the same place as actual survey attributes is not allowed in Germany

- Projects keep them in separate locations
- Can only be matched with a correspondence table
- Necessary to conduct data linking
]

.pull-right[
<img
src="data:image/png;base64,#../img/fig_linking_workflow_simple.png" width="949" style="display: block; margin: auto;" />

.right[.tinyisher[Jünger, 2019]]
]

---

## Geocoding

Indirect spatial references have to be converted into direct spatial references

`\(\rightarrow\)` Addresses to geo-coordinates

Different service providers can be used, e.g., Google, Bing, or OSM, but they raise concerns about data protection and data quality.

We rely on a service offered by the Federal Agency for Cartography and Geodesy (BKG):

- Online interface and API for online geocoding
- Offline geocoding possible based on raw data
- But: Data and service are restricted

---

## `bkggeocoder`

.pull-left[
R package `bkggeocoder` developed at GESIS for (offline) geocoding by Stefan and Jonas Lieth:

- Access via [GitHub](https://github.com/StefanJuenger/bkggeocoder)
- Introduction in the [Meet the Experts Talk](https://www.youtube.com/watch?v=ZnA21LyKK88&feature=youtu.be) by Stefan
]

.pull-right[
</br>
</br>
<img src="data:image/png;base64,#../img/bkggeocoder.png" width="65%" style="display: block; margin: auto;" />
]

---

## Spatial Linking

.pull-left[
The geocoding tool automatically retrieves point coordinates, administrative unit keys, and INSPIRE grid cell IDs.

Spatial joins based on coordinates for other units:

- constituencies
- administrative units across time (e.g., harmonized territorial status)
]

.pull-right[
<img src="data:image/png;base64,#../img/fig_3d_.png" width="80%" style="display: block; margin: auto;" />

.tinyisher[Sources: OpenStreetMap / GEOFABRIK (2018), City of Cologne (2014), Leibniz Institute of Ecological Urban and Regional Development (2018), Statistical Offices of the Federation and the Länder (2016), and German Environmental Agency / EIONET Central Data Repository (2016) / Jünger, 2019]
]

---

## Data Linking

Linking via IDs is most commonly used but comes with its own challenges (e.g., territorial status and land reforms? comparable units? heterogeneity within units?).
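A minimal sketch of ID-based linking, assuming made-up district IDs and values (all `toy_*` objects are hypothetical and not part of the workshop data): an attribute join, followed by a check for survey records that find no match in the context data, which is one way problems with territorial status can show up.

```r
library(dplyr)

# hypothetical survey records carrying a district id
toy_survey <- tibble::tibble(
  respondent  = c("a", "b", "c"),
  district_id = c("01001", "01002", "01003")
)

# hypothetical context data; id "01003" is absent, e.g., because the
# unit no longer exists under the territorial status of the context data
toy_context <- tibble::tibble(
  district_id = c("01001", "01002", "01004"),
  pop_dens    = c(2700, 2900, 190)
)

# attribute join via the shared id: unmatched respondents get NA
toy_linked <- dplyr::left_join(toy_survey, toy_context, by = "district_id")

# survey records without any match in the context data
toy_unmatched <- dplyr::anti_join(toy_survey, toy_context, by = "district_id")

toy_linked
toy_unmatched
```

Checking the unmatched records before the analysis makes silent `NA`s visible; how to resolve them (e.g., harmonizing to one territorial status) depends on the data at hand.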
<img src="data:image/png;base64,#../img/data_linking.png" width="75%" style="display: block; margin: auto;" />

---

## Spatial Linking Methods (Examples) I

.pull-left[
1:1

.tinyisher[sf::st_join]

<img src="data:image/png;base64,#../img/fig_linking_by_location_noise.png" width="75%" style="display: block; margin: auto;" />
]

.pull-right[
Distances

.tinyisher[sf::st_distance]

<img src="data:image/png;base64,#../img/fig_linking_distance_noise_appI.png" width="75%" style="display: block; margin: auto;" />
]

.tinyisher[Sources: German Environmental Agency / EIONET Central Data Repository (2016) and OpenStreetMap / GEOFABRIK (2018) / Jünger, 2019]

---

## Spatial Linking Methods (Examples) II

.pull-left[
Filter methods

.tinyisher[sf::st_filter or terra::vect(. , filter = )]

<img src="data:image/png;base64,#../img/fig_linking_focal_immigrants.png" width="75%" style="display: block; margin: auto;" />
]

.pull-right[
Buffer zones

.tinyisher[sf::st_buffer (combined with terra::vect())]

<img src="data:image/png;base64,#../img/fig_linking_buffer_sealing.png" width="75%" style="display: block; margin: auto;" />
]

.tinyisher[Sources: Leibniz Institute of Ecological Urban and Regional Development (2018) and Statistical Offices of the Federation and the Länder (2016) / Jünger, 2019]

---

## Fake Research Question

.pull-left[
Say we're interested in the impact of the current pandemic on individual well-being in the geographic context.

We plan to conduct a survey in the state of North Rhine-Westphalia.
] .pull-right[ </br> <img src="data:image/png;base64,#../img/4iq3kg.jpg" width="813" style="display: block; margin: auto;" /> .center[.tinyisher[https://imgflip.com/memegenerator/Trump-Bill-Signing] ] ] --- ## Our Sample Area: NRW's Boundaries .pull-left[ ```r sampling_area <- osmdata::getbb( "Nordrhein-Westfalen", format_out = "sf_polygon" ) %>% .$multipolygon %>% sf::st_transform(3035) ``` ] -- .pull-right[ ```r tm_shape(sampling_area) + tm_borders() ``` <img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/nrw-map-1.png" style="display: block; margin: auto;" /> ] --- ## A Fake-Life Application .pull-left[ Let's sample 1,000 people to interview them about their lives. We can draw a fake sample this way and also add an identifier for the respondents: ```r set.seed(1234) ``` ```r fake_coordinates <- sf::st_sample(sampling_area, 1000) %>% sf::st_sf() %>% dplyr::mutate( id_2 = stringi::stri_rand_strings(10000, 10) %>% sample(1000, replace = FALSE) ) ``` ] -- .pull-right[ ```r tm_shape(sampling_area) + tm_borders() + tm_shape(fake_coordinates) + tm_dots() ``` <img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/map-osm-coordinates-1.png" style="display: block; margin: auto;" /> ] --- ## Correspondence Table As in any survey that deals with addresses, we need a correspondence table of the distinct identifiers. ```r correspondence_table <- dplyr::bind_cols( id = stringi::stri_rand_strings(10000, 10) %>% sample(1000, replace = FALSE), id_2 = fake_coordinates$id_2 ) correspondence_table ``` ``` ## # A tibble: 1,000 × 2 ## id id_2 ## <chr> <chr> ## 1 ubsHG5McEM ihkXs9ejBD ## 2 WN7Ih0Y5Rz bN2W1BpZKx ## 3 lmLdwfl3cu wAbDkMovWz ## 4 uq2Rb6Dj2w R4eIulul4z ## 5 y7eYFQSuP3 XOvF2ZuGg1 ## 6 UxERvtP2Kx EPuILKVeoq ## 7 67N3O8FPyO 39TfAAxmme ## 8 I0AUhXMPkD 2tolhpgrNl ## 9 41h2EPFU1S nGgofAl6iC ## 10 9YaVHR70jt 4sXgiH1ydA ## # ℹ 990 more rows ``` --- ## Conduct the Survey We ask respondents for some standard sociodemographics. 
But we also apply a new and highly innovative item score, the Fake Corona Burden Score (FCBS), generated using the [`faux` package](https://cran.r-project.org/web/packages/faux/index.html).

```r
fake_survey_data <-
  dplyr::bind_cols(
    id = correspondence_table$id,
    age = sample(18:100, 1000, replace = TRUE),
    gender = sample(1:2, 1000, replace = TRUE) %>%
      as.factor(),
    education = sample(1:4, 1000, replace = TRUE) %>%
      as.factor(),
    income = sample(100:10000, 1000, replace = TRUE),
    fcbs = secret_variable_we_are_hiding_from_you
  )
```

---

## Survey Data Structure

```r
fake_survey_data
```

```
## # A tibble: 1,000 × 6
##    id           age gender education income  fcbs
##    <chr>      <int> <fct>  <fct>      <int> <dbl>
##  1 ubsHG5McEM    72 2      1           6061  64.5
##  2 WN7Ih0Y5Rz    49 1      3           4548  61.7
##  3 lmLdwfl3cu    84 1      4           6850  49.4
##  4 uq2Rb6Dj2w    90 1      4           1186  59.4
##  5 y7eYFQSuP3    88 2      2           5888  68.6
##  6 UxERvtP2Kx    58 2      1           9210  51.3
##  7 67N3O8FPyO    90 2      3            789  61.1
##  8 I0AUhXMPkD    45 1      4           1925  48.3
##  9 41h2EPFU1S    36 1      3           9587  35.7
## 10 9YaVHR70jt    98 2      2           4455  28.8
## # ℹ 990 more rows
```

---

## What could explain our Fake Corona Burden Score?

*Likelihood to meet people*

> The higher the district's population density, the lower the Fake Corona Burden Score.

--

*Provision of health services*

> The greater the distance to the closest hospital, the higher the Fake Corona Burden Score.

--

*Possible language issues in health care communication*

> The higher the immigrant rate in the neighborhood, the higher the Fake Corona Burden Score.

---

## Population Density

Once all data sets are loaded, we reduce our sample to the area of North Rhine-Westphalia.

```r
sampling_area_districts_enhanced <-
  # load district shapefile
  sf::read_sf("./data/VG250_KRS.shp") %>%
  # transform crs
  sf::st_transform(3035) %>%
  # some data cleaning
  dplyr::rename(district_id = AGS) %>%
  dplyr::select(district_id) %>%
  # left join attributes
  dplyr::left_join(.
    , readr::read_csv("./data/attributes_districts.csv"),
    by = "district_id"
  ) %>%
  # reduce to area of NRW: keep districts that intersect the sampling area
  sf::st_join(., sampling_area, join = sf::st_intersects, left = FALSE)
```

---

## Calculate size of the area

```r
# calculate area of districts
# areas will always be calculated
# in units according to the CRS
sf::st_area(sampling_area_districts_enhanced) %>%
  head(4)
```

```
## Units: [m^2]
## [1] 1269653841 1991051810  797826703  694472630
```

---

## Population Density

All that is left to do is a simple mutation. Let's pipe it!

.pull-left[
```r
# calculate population density
sampling_area_districts_enhanced <-
  sampling_area_districts_enhanced %>%
  # calculate area of districts (areas will always
  # be calculated in units according to the CRS)
  dplyr::mutate(area = sf::st_area(.)) %>%
  # change unit to square kilometers
  dplyr::mutate(area_km2 = units::set_units(area, km^2)) %>%
  # recode variable as numeric
  dplyr::mutate(area_km2 = as.numeric(area_km2)) %>%
  # calculate population density
  dplyr::mutate(pop_dens = population / area_km2)
```
]

.pull-right[
<img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />
]

---

## Aggregate Data

Squeezing in a small example on data aggregation. If you do not have ID information or additional shapefiles, you can rely on `st_combine(x)`, `st_union(x,y)`, and `st_intersection(x,y)` to combine shapefiles, resolve borders, and return the intersection of two shapefiles.

.pull-left[
```r
# load data
german_states <-
  sf::read_sf("./data/VG250_LAN.shp") %>%
  sf::st_transform(3035) %>%
  dplyr::filter(GF == 4) %>%
  dplyr::rename(state_id = AGS) %>%
  dplyr::select(state_id)

german_districts <-
  sf::read_sf("./data/VG250_KRS.shp") %>%
  sf::st_transform(3035) %>%
  dplyr::rename(district_id = AGS) %>%
  dplyr::select(district_id) %>%
  dplyr::left_join(.
    , readr::read_csv("./data/attributes_districts.csv"),
    by = "district_id"
  )
```
]

.pull-right[
```r
# the 'dplyr way'
district_aggregated_by_state <-
  sf::st_join(
    german_districts,
    german_states,
    join = sf::st_intersects,
    left = TRUE
  ) %>%
  dplyr::group_by(state_id) %>%
  dplyr::summarise(state_death_rate = sum(death_rate, na.rm = TRUE))
```
]

---

## Aggregate Data

.pull-left[
```r
tm_shape(district_aggregated_by_state) +
  tm_polygons(col = "state_death_rate")
```
]

.pull-right[
<img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/show-aggregation-1.png" style="display: block; margin: auto;" />
]

---

## Respondents in Districts

We have population density on the district level. Since our analysis focuses on the individual level, we can spatially join the information to our fake respondents' coordinates.

```r
# join back
spatial_information <-
  sampling_area_districts_enhanced %>%
  # keeping just the variables we want
  dplyr::select(district_id, pop_dens) %>%
  # since we want to join district to
  # respondent defining coordinates first
  sf::st_join(
    fake_coordinates,
    # district data second
    .,
    # some points may lie on the border
    # choosing intersects therefore
    join = sf::st_intersects
  ) %>%
  # drop our coordinates for data protection
  sf::st_drop_geometry()
```

---

## Respondents in Districts

```r
head(spatial_information, 5)
```

```
##         id_2 district_id pop_dens
## 1 ihkXs9ejBD       05166 529.7002
## 2 bN2W1BpZKx       05570 210.3911
## 3 wAbDkMovWz       05958 132.1548
## 4 R4eIulul4z       05570 210.3911
## 5 XOvF2ZuGg1       05966 187.3264
```

---

## Distance Calculation

`sf::st_distance()` calculates the distances between **all** respondents and **all** hospitals, resulting in a matrix with 1,786,000 entries (1,000 respondents × 1,786 hospitals). We can make our lives a little bit easier by treating this matrix as a `tibble`.
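To make the shape of that matrix concrete, here is a minimal sketch on toy data (the `toy_*` objects and their coordinates are hypothetical, chosen in EPSG:3035 so that distances come out in meters). It also shows `sf::st_nearest_feature()` as one possible shortcut when only the closest hospital is needed; this is an alternative to, not a description of, the approach used in these slides.

```r
library(sf)

# hypothetical toy data: 3 "respondents" and 2 "hospitals"
toy_respondents <- sf::st_sfc(
  sf::st_point(c(4100000, 3100000)),
  sf::st_point(c(4101000, 3100000)),
  sf::st_point(c(4104000, 3102000)),
  crs = 3035
)

toy_hospitals <- sf::st_sfc(
  sf::st_point(c(4100000, 3100000)),
  sf::st_point(c(4104000, 3103000)),
  crs = 3035
)

# dense matrix of all pairwise distances:
# one row per respondent, one column per hospital
toy_matrix <- sf::st_distance(toy_respondents, toy_hospitals)
dim(toy_matrix)

# alternative: index of the nearest hospital for each respondent ...
toy_nearest <- sf::st_nearest_feature(toy_respondents, toy_hospitals)

# ... and the element-wise distance to exactly that hospital
toy_min_dist <- sf::st_distance(
  toy_respondents,
  toy_hospitals[toy_nearest],
  by_element = TRUE
)
```

Both routes should give the same minimum distances; the full matrix is useful when more than the nearest neighbor is of interest (e.g., the number of hospitals within a given radius).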
.pull-left[ ```r # distances between each respondent # and each hospital distance_matrix <- # point layer "distance from" sf::st_distance( fake_coordinates, # point layer "distance to" sampling_area_hospitals, # dense matrix with all # pairwise distance by_element = FALSE ) %>% # making life a little bit easier dplyr::as_tibble() # check our matrix # again, units = CRS units! distance_matrix ``` ] .pull-right[ ``` ## # A tibble: 1,000 × 1,786 ## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 ## [m] [m] [m] [m] [m] [m] [m] [m] [m] [m] [m] [m] [m] ## 1 4.42e5 4.42e5 4.25e5 4.23e5 4.25e5 4.25e5 4.27e5 4.28e5 4.26e5 4.24e5 4.25e5 4.28e5 4.28e5 ## 2 3.21e5 3.21e5 2.90e5 2.89e5 2.90e5 2.90e5 2.93e5 2.94e5 2.91e5 2.89e5 2.90e5 2.93e5 2.93e5 ## 3 3.93e5 3.93e5 3.57e5 3.56e5 3.58e5 3.58e5 3.61e5 3.61e5 3.59e5 3.57e5 3.58e5 3.60e5 3.60e5 ## 4 3.26e5 3.26e5 2.94e5 2.93e5 2.95e5 2.95e5 2.98e5 2.98e5 2.95e5 2.94e5 2.95e5 2.97e5 2.97e5 ## 5 4.17e5 4.17e5 3.78e5 3.78e5 3.79e5 3.79e5 3.83e5 3.83e5 3.80e5 3.78e5 3.79e5 3.82e5 3.81e5 ## 6 3.81e5 3.81e5 3.50e5 3.49e5 3.51e5 3.51e5 3.54e5 3.54e5 3.51e5 3.49e5 3.50e5 3.53e5 3.53e5 ## 7 3.99e5 3.99e5 3.59e5 3.58e5 3.60e5 3.60e5 3.63e5 3.63e5 3.61e5 3.59e5 3.60e5 3.62e5 3.62e5 ## 8 3.95e5 3.95e5 3.62e5 3.60e5 3.62e5 3.62e5 3.65e5 3.65e5 3.63e5 3.61e5 3.62e5 3.65e5 3.64e5 ## 9 4.00e5 4.00e5 3.79e5 3.77e5 3.79e5 3.79e5 3.81e5 3.82e5 3.79e5 3.78e5 3.79e5 3.82e5 3.81e5 ## 10 4.65e5 4.65e5 4.38e5 4.36e5 4.38e5 4.38e5 4.41e5 4.41e5 4.39e5 4.37e5 4.38e5 4.41e5 4.40e5 ## # ℹ 990 more rows ## # ℹ 1,773 more variables: V14 [m], V15 [m], V16 [m], V17 [m], V18 [m], V19 [m], V20 [m], ## # V21 [m], V22 [m], V23 [m], V24 [m], V25 [m], V26 [m], V27 [m], V28 [m], V29 [m], ## # V30 [m], V31 [m], V32 [m], V33 [m], V34 [m], V35 [m], V36 [m], V37 [m], V38 [m], ## # V39 [m], V40 [m], V41 [m], V42 [m], V43 [m], V44 [m], V45 [m], V46 [m], V47 [m], ## # V48 [m], V49 [m], V50 [m], V51 [m], V52 [m], V53 [m], V54 [m], V55 [m], V56 [m], ## # V57 [m], V58 [m], V59 [m], 
V60 [m], V61 [m], V62 [m], V63 [m], V64 [m], V65 [m], …
```
]

---

## Find Minimum Distance

That's all there is concerning the "spatial" part of our data wrangling. From now on, just good old data crunching to get our distance to the closest hospital.

.pull-left[
```r
distance_closest <-
  distance_matrix %>%
  # from unit to numeric
  dplyr::mutate_all(as.numeric) %>%
  # identify for each row the minimum
  # & save in variable
  dplyr::mutate(dist_closest_hospital = (apply(., 1, min))) %>%
  # get kilometer instead of meter
  dplyr::mutate(dist_closest_hospital = dist_closest_hospital / 1000) %>%
  # select only column
  # containing smallest distance
  dplyr::select(dist_closest_hospital)
```
]

.pull-right[
```
## # A tibble: 1,000 × 1
##    dist_closest_hospital
##                    <dbl>
##  1                 1.23
##  2                 8.08
##  3                 5.98
##  4                 5.27
##  5                10.9
##  6                 0.921
##  7                12.8
##  8                 3.72
##  9                 4.80
## 10                 3.00
## # ℹ 990 more rows
```
]

---

## Join to our spatial information!

I prefer to work with kilometers rather than meters. And I want to add our new variable to the other spatial information we already prepared. Luckily, I know that the spatial information table has the same length and order as the fake coordinates.

.pull-left[
```r
spatial_information <-
  distance_closest %>%
  # bind columns with spatial information
  # only bind with other data set than the
  # original coordinates when you are 100
  # percent sure it's same length and order!
  dplyr::bind_cols(spatial_information, .)
```
]

.pull-right[
```
##         id_2 district_id pop_dens dist_closest_hospital
## 1 ihkXs9ejBD       05166 529.7002             1229.4851
## 2 bN2W1BpZKx       05570 210.3911             8078.3521
## 3 wAbDkMovWz       05958 132.1548             5975.1750
## 4 R4eIulul4z       05570 210.3911             5265.9808
## 5 XOvF2ZuGg1       05966 187.3264            10889.7503
## 6 EPuILKVeoq       05978 725.1611              921.4407
```
]

---

## Immigrant Rate Buffers

...and we're not yet done: we still need the immigrant rate in the neighborhood. Let's calculate buffers of 500 meters and add their mean values to our dataset.
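The buffer idea can be tried out on a small, fully synthetic example first. The sketch below assumes a hypothetical 10 × 10 raster with 100 m cells (value 0 in the western half, 10 in the eastern half) and two hypothetical points with 100 m buffers; none of these objects belong to the workshop data.

```r
library(sf)
library(terra)

# hypothetical toy raster: 100 m cells, 0 in the west, 10 in the east
toy_raster <- terra::rast(
  nrows = 10, ncols = 10,
  xmin = 0, xmax = 1000, ymin = 0, ymax = 1000,
  crs = "EPSG:3035",
  vals = rep(c(rep(0, 5), rep(10, 5)), times = 10)
)

# two hypothetical respondents, one in each half
toy_points <- sf::st_sf(
  geometry = sf::st_sfc(
    sf::st_point(c(250, 500)),
    sf::st_point(c(750, 500)),
    crs = 3035
  )
)

# buffer the points and average the raster cells within each buffer
toy_means <- terra::extract(
  toy_raster,
  toy_points %>%
    sf::st_buffer(100) %>%
    terra::vect(),
  fun = mean,
  na.rm = TRUE,
  ID = FALSE
)

# with ID = FALSE, the first column holds the buffer means
toy_means[[1]]
```

Because each buffer lies entirely within one half of the raster, the means simply reproduce the cell values of that half, which makes it easy to check that buffering and extraction behave as intended before moving on to real data.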
.pull-left[
```r
# download data & create rate
immigrants_nrw <-
  z11::z11_get_100m_attribute(STAATSANGE_KURZ_2) %>%
  terra::crop(. , sampling_area)

inhabitants_nrw <-
  z11::z11_get_100m_attribute(Einwohner) %>%
  terra::crop(. , sampling_area)

immigrant_rate <-
  immigrants_nrw * 100 / inhabitants_nrw
```
]

.pull-right[
```r
# calculate immigrant rate for 500m buffer
immigrant_buffers <-
  terra::extract(
    immigrant_rate,
    fake_coordinates %>%
      sf::st_buffer(500) %>%
      terra::vect(),
    fun = mean,
    na.rm = TRUE,
    ID = FALSE,
    raw = TRUE
  )

# spatially link with buffers on the fly
spatial_information <-
  spatial_information %>%
  dplyr::mutate(immigrant_buffers = immigrant_buffers[[2]])
```
]

---

## Join with Fake Burden Score

I hope you're not tired of joining data tables. Since we care a tiny bit more about data protection than others, we have yet another joining task left: joining the information we received using our (protected) fake coordinates to the actual survey data via the correspondence table.

.pull-left[
```r
# last joins for now
fake_survey_data_spatial <-
  # first join the id
  dplyr::left_join(
    correspondence_table,
    spatial_information,
    by = "id_2"
  ) %>%
  # drop the fake_coordinate id
  dplyr::select(-id_2) %>%
  # join the survey data
  dplyr::left_join(
    fake_survey_data,
    by = "id"
  )
```
]

.pull-right[
<img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/correlation-plot-1.png" width="75%" style="display: block; margin: auto;" />
]

---
class: middle

## Exercise 2_2_1: Spatial Joins

[Exercise](https://stefanjuenger.github.io/gesis-workshop-geospatial-techniques-R-2023/exercises/2_2_1_Spatial_Joins.html)

[Solution](https://stefanjuenger.github.io/gesis-workshop-geospatial-techniques-R-2023/exercises/2_2_1_Spatial_Joins.html)

---
class: middle

## Add-on slides: Example Studies

---

## Environmental inequalities (Jünger, 2021)

> Is income associated with fewer environmental disadvantages, and are there differences between German people and people with a migration
background?

.pull-left[
.small[
Theoretical Framework

- Social and Ethnic Inequalities (Crowder & Downey, 2010)
- Place Stratification (Lersch, 2013)

Data

- GGSS 2016 & 2018
- soil sealing & green spaces
]
]

.pull-right[
<img src="data:image/png;base64,#../img/fig_linking_buffer_sealing.png" width="65%" style="display: block; margin: auto;" />

.tinyisher[Leibniz Institute of Ecological Urban and Regional Development (2018) / Jünger, 2019]
]

---

## Results

<img src="data:image/png;base64,#../img/FIGURE_2.png" width="70%" style="display: block; margin: auto;" />

.tinyisher[Data source: GGSS 2016 & 2018; N = 6,117; 95% confidence intervals based on cluster-robust standard errors (sample point); all models control for age, gender, education, household size, German region and survey year interaction, inhabitant size of the municipality, and distance to municipality administration]

---

## Attitudes towards minorities (Jünger & Schaeffer, 2022)

> Do people who live in ethnically homogeneous neighborhoods that are close to ethnically diverse ones have more negative attitudes towards minorities?

.pull-left[
.small[
Theoretical Framework

- Contact Theory (Allport, 1954)
- Ethnic Competition (Stephan et al., 2009)

Data

- GGSS 2016
- German Census 2011
]
]

.pull-right[
<img src="data:image/png;base64,#../img/Abb1.png" width="65%" style="display: block; margin: auto;" />

.tinyisher[German Census 2011, OpenStreetMap / Jünger & Schaeffer, 2022]
]

---

## Results

<img src="data:image/png;base64,#../img/Abb2.png" width="70%" style="display: block; margin: auto;" />

.tinyisher[Data source: GGSS 2016; N = 1,689; 95% confidence intervals based on cluster-robust standard errors (sample point); all models control for age, gender, education, income, unemployment, homeownership, immigrants and inhabitants in the neighborhood, inhabitant size of the municipality, German region]

---

## Left Behind by the State?
(Stroppe, Forthcoming) > Are political trust levels affected by the accessibility of public services and infrastructures for citizens? .pull-left[ .small[ Theoretical Framework - Political Performance-Trust Link (Easton 1965, Hetherington 2005) - Context condition low-intensity information cue (Cho & Rudolph 2008) Data - GGSS 2018 - hospital, school, train station (distance measures) - municipality data ] ] .pull-right[ <br> <img src="data:image/png;base64,#../img/meandist_trains.PNG" width="65%" style="display: block; margin: auto;" /> .tinyisher[Federal Statistical Office 2019, Deutsche Bahn 2017 and GeoBasis-DE / BKG 2022 / Stroppe, 2023] ] --- ## Results <br> <img src="data:image/png;base64,#../img/fig1_coefplot_colored.png" width="95%" style="display: block; margin: auto;" /> .tinyisher[Data source: GGSS 2018 and Federal Statistical Office 2017. N = 3030, Groups = 152 (Municipalities). Fitted Models: OLS multi-level random effect models. Individual-level controls: income, gender, education, age, personal trust, political interest. Municipality level controls: population density and unemployment. Dependent variable: Trust in government. Survey weights are applied.] 
--- layout: false class: center background-image: url(data:image/png;base64,#../assets/img/the_end.png) background-size: cover .left-column[ </br> <img src="data:image/png;base64,#../img/Anne.png" width="75%" style="display: block; margin: auto;" /> ] .right-column[ .left[.small[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M464 64H48C21.49 64 0 85.49 0 112v288c0 26.51 21.49 48 48 48h416c26.51 0 48-21.49 48-48V112c0-26.51-21.49-48-48-48zm0 48v40.805c-22.422 18.259-58.168 46.651-134.587 106.49-16.841 13.247-50.201 45.072-73.413 44.701-23.208.375-56.579-31.459-73.413-44.701C106.18 199.465 70.425 171.067 48 152.805V112h416zM48 400V214.398c22.914 18.251 55.409 43.862 104.938 82.646 21.857 17.205 60.134 55.186 103.062 54.955 42.717.231 80.509-37.199 103.053-54.947 49.528-38.783 82.032-64.401 104.947-82.653V400H48z"></path> </svg> [anne-kathrin.stroppe@gesis.org](mailto:anne-kathrin.stroppe@gesis.org)] .small[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path> </svg> 
[`@astroppe`](https://twitter.com/stroppann)] .small[<svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path> </svg> [`stroppann`](https://github.com/stroppann)] .small[<svg viewBox="0 0 576 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M280.37 148.26L96 300.11V464a16 16 0 0 0 16 16l112.06-.29a16 16 0 0 0 15.92-16V368a16 16 0 0 1 16-16h64a16 16 0 0 1 16 16v95.64a16 16 0 0 0 16 16.05L464 480a16 16 0 0 0 16-16V300L295.67 148.26a12.19 12.19 0 0 0-15.3 0zM571.6 251.47L488 182.56V44.05a12 12 0 0 0-12-12h-56a12 12 0 0 0-12 
12v72.61L318.47 43a48 48 0 0 0-61 0L4.34 251.47a12 12 0 0 0-1.6 16.9l25.5 31A12 12 0 0 0 45.15 301l235.22-193.74a12.19 12.19 0 0 1 15.3 0L530.9 301a12 12 0 0 0 16.9-1.6l25.5-31a12 12 0 0 0-1.7-16.93z"></path> </svg> [`NA`](NA)]] ]